A common first step in time series signal analysis involves digitally filtering the data to remove linear correlations. 
The residual data is spectrally white (it is "bleached"), but in principle retains the nonlinear structure of the 
original time series. It is well known that simple linear autocorrelation can give rise to spurious results in 
algorithms for estimating nonlinear invariants, such as fractal dimension and Lyapunov exponents. In theory, 
bleached data avoids these pitfalls. But in practice, bleaching obscures the underlying deterministic structure of 
a low-dimensional chaotic process. This appears to be a property of the chaos itself, since nonchaotic data are 
not similarly affected. The adverse effects of bleaching are demonstrated in a series of numerical experiments on 
known chaotic data. Some theoretical aspects are also discussed. 



INTRODUCTION 



Much of the current interest in nonlinear signal processing arises
not so much as an extension of linear analysis, but from the recognition
that an entirely new idea - chaos - will play a significant
role. In some cases, this entirely new idea has led to entirely new
techniques for time series analysis. These have provided experimentalists
with new ways to understand the implications of their
data, though the limitations of these new technologies have not
always been understood or well appreciated. In other cases, chaos
has shed new light on the interpretation of conventional time series
analysis tools (for instance, by providing a deterministic explanation
for broadband spectra). Our intent here is to investigate the
limitations of one of these conventional tools in the context of
chaotic time series.

Bleaching, or "pre-whitening," is the process of linearly filter- 
ing time series data to remove autocorrelation — that is, to make 
the power spectrum more nearly flat, or "white." As a first step in 
time series analysis, it is a time-honored practice among statisticians
[1-4] and statistics-minded economists [5-10]. Even the classic
treatise of Blackman and Tukey [11] recommends "preemphasis"
of a signal to make the spectrum "more nearly constant." It is
an initially attractive procedure because it eliminates autocorrela- 
tion, which is one of the major sources of artifact in nonlinear time 
series analysis [12-16]. Further, since bleaching is accomplished 
with a finite order non-recursive (or finite-impulse-response, or 
FIR) filter, it can be proven that the nonlinear properties (such 
as dimension and Lyapunov exponent) remain invariant [6, 17-21]. 

However, this theoretical invariance does not always carry over 
to practical data analysis. It has long been known that recursive
(or infinite-impulse-response, or IIR) filters can, in practice and
in principle, change the character of a nonlinear process, as inferred
from its time series [22, 23]. Mitschke [24] suggested that
acausal IIR filters might be less destructive, though others [16,
21] have shown that these too can change the nonlinear invari- 
ants. Insofar as FIR filters approximate IIR filters, their effects 
can be similarly detrimental: a graphic demonstration is provided 
in Ref. [19]. In an earlier paper [25], we briefly noted that bleaching 
with very high order linear filters can degrade evidence for nonlin- 
earity in a time series. In this paper, that observation is extended. 
Even when the bleaching is constrained to relatively low order (by 
the Akaike criterion, for instance), and even for tasks other than 
detecting nonlinear structure, we find that the effect of bleaching 
on chaotic data can be detrimental. On the other hand, bleaching 
nonchaotic data does not have such a negative effect. 

After introducing the bleaching process in Sect. II, the effect 
of bleaching on chaotic data is demonstrated numerically, first by 
looking at the problem of nonlinear prediction (in Sect. III), then 
by comparing residual-based to surrogate data approaches for de- 
tecting nonlinearity in time series (in Sect. IV). These numerical 
results lead us to argue against pre-whitening chaotic data; how- 
ever, in Sect. V, this view is tempered by showing that some linear 
prefiltering can still be advantageous. The emphasis in this paper 
is on numerical results, but in Sect. VI some theoretical issues are 
discussed: the limit of infinite data with infinite order filtering; and 
the relation of filtering to the more familiar problem of "optimal" 
embedding. 



II BLEACHING 

Given a time series $x_t$, the best linear predictor $\hat{x}_t$ is given by the
model

$$\hat{x}_t = a_0 + \sum_{k=1}^{q} a_k x_{t-k} \tag{1}$$

for which $\hat{x}_t$ most closely approximates $x_t$ in the least-squares
sense. The residuals (also called "innovations" or "disturbances")
$e_t = x_t - \hat{x}_t$ measure how much of the original time series is not
linearly predictable from the past. That is,

$$e_t = x_t - \left[ a_0 + \sum_{k=1}^{q} a_k x_{t-k} \right] \tag{2}$$

where q is the order of the model, and the linear coefficients $a_k$ are
obtained by a least-squares fit which minimizes the variance of the
residuals.

A result from the theory of linear time series analysis states
that in the large q limit, the residuals $e_t$ obtained by subtracting
from $x_t$ the best linear predictor $\hat{x}_t$ will be uncorrelated; that is,
the residuals $e_t$ are spectrally white (the Appendix outlines an
informal proof).
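
The bleaching step of Eqs. (1) and (2) can be sketched in a few lines of NumPy. This is only an illustrative sketch, not the code used for the experiments: the function names `henon_x` and `bleach` are hypothetical, and the Henon parameters a = 1.4, b = 0.3 and the transient length are assumed standard values that the text does not state.

```python
import numpy as np

def henon_x(n, a=1.4, b=0.3, discard=1000):
    """Return n successive x values of the Henon map (assumed parameters)."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def bleach(x, q):
    """Least-squares fit of Eq. (1); returns (a_0..a_q, residuals e_t of Eq. (2)).

    For q = 0 the raw data are returned unchanged, following the convention
    that q = 0 denotes unfiltered data.
    """
    x = np.asarray(x, dtype=float)
    if q == 0:
        return np.zeros(1), x.copy()
    # Row for time t holds [1, x_{t-1}, ..., x_{t-q}], for t = q .. N-1.
    lags = [x[q - k : len(x) - k] for k in range(1, q + 1)]
    A = np.column_stack([np.ones(len(x) - q)] + lags)
    coef, *_ = np.linalg.lstsq(A, x[q:], rcond=None)
    return coef, x[q:] - A @ coef

x = henon_x(1024)
coef, e = bleach(x, 6)
print("residual variance / data variance:", e.var() / x.var())
```

Because the fit is least-squares, the residuals are orthogonal (in sample) to each of the q lagged copies of the data, which is the finite-q counterpart of the whiteness statement above.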
FIG. 1. Effect of bleaching on a time series derived from the x 
values of the Henon map. (a) The unfiltered data corresponds to 
q = 0. (b) A q = 6 filter distorts the attractor considerably, and 
hides the determinism that is evident in the raw data. 

FIG. 2. Four measures of goodness of fit are plotted as a function of 
q for the Henon map with N = 1024 points. The in-sample fitting 
error (□) decreases monotonically with increasing q because there 
is no penalty for more parameters and no guard against overfitting. 
The out-of-sample fitting error (x) and the Akaike Information 
Criterion (o) both show a leveling-off at q = 6, while the Schwarz 
criterion (o) indicates a definite minimum at q = 6, suggesting that 
an order 6 fit is optimal with this many data points. The Akaike 
curve is $\log\sigma + q/N$ where $\sigma$ is the in-sample rms fitting error, and 
the Schwarz curve is $\log\sigma + (q \log N)/2N$. 



While the fit is based on the best auto-regressive (AR) model, 
the linear map that takes $x_t$ to $e_t$ in Eq. (2) is a moving-average 
(MA) filter; that is, it is a nonrecursive, finite order, or finite-impulse-response 
(FIR), filter. Strictly speaking, it will not change 
the structure of the attractor for finite q [6, 17-20]. For example, 
if $x_t$ lies on a strange attractor, then $e_t$ will lie on an attractor 
of the same dimension. This is not true of an AR filter, which 
can increase the dimension of the attractor [22, 23]. Actually, it 
is possible for a nongeneric MA filter to reduce the dimension, by 
"undoing" an AR filter's increase [18, 19]. 

However, as Fig. 1 shows, the effect of bleaching the Henon at- 
tractor [26] is to distort the attractor considerably, and to make its 
low-dimensionality much less evident. The order of the model, q, is 
generally chosen by some criterion which trades off the variance of 
the residuals (in-sample error of fit) against a penalty for number 
of parameters. In Fig. 2, we plot Akaike's information criterion 
(AIC) [27], Schwarz's criterion [28], and out-of-sample error as a 
function of q, and show that q = 6 is a good choice for the Henon 
map with N = 1024 points. 
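
As a rough illustration of this order selection, the sketch below evaluates the two penalized criteria in the forms quoted in the caption of Fig. 2, namely log σ + q/N (Akaike) and log σ + (q log N)/2N (Schwarz), with σ the in-sample rms fitting error. The helper names are hypothetical, the Henon parameters are assumed standard values, and fitting every order on the same set of target points (so that the in-sample error is guaranteed not to increase with q) is a choice made here, not something specified in the text. The out-of-sample error is omitted because the text does not say how the data were split for it.

```python
import numpy as np

def henon_x(n, a=1.4, b=0.3, discard=1000):
    """Return n successive x values of the Henon map (assumed parameters)."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def rms_fit_error(x, q, qmax):
    """In-sample rms error of the least-squares AR(q) fit on targets x[qmax:]."""
    target = x[qmax:]
    cols = [np.ones(len(target))]
    cols += [x[qmax - k : len(x) - k] for k in range(1, q + 1)]
    A = np.column_stack(cols)
    coef, *_ = np.linalg.lstsq(A, target, rcond=None)
    return np.sqrt(np.mean((target - A @ coef) ** 2))

def order_criteria(x, qmax):
    """Return a list of (q, sigma, akaike, schwarz) for q = 0 .. qmax."""
    N = len(x)
    rows = []
    for q in range(qmax + 1):
        sigma = rms_fit_error(x, q, qmax)
        akaike = np.log(sigma) + q / N
        schwarz = np.log(sigma) + q * np.log(N) / (2 * N)
        rows.append((q, sigma, akaike, schwarz))
    return rows

for q, sigma, akaike, schwarz in order_criteria(henon_x(1024), 10):
    print(f"q={q:2d}  sigma={sigma:.4f}  AIC={akaike:.4f}  Schwarz={schwarz:.4f}")
```

The order would then be read off as the q at which a criterion reaches its minimum or levels off.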

III NONLINEAR MODELING 

A very direct measure of determinism in a time series is the accu- 
racy of a nonlinear predictor. We performed a numerical experi- 
ment that involved modeling the Henon attractor with a nonlinear 
predictor based on local-linear fits to the k nearest neighbors [29]. 
The results are shown in Fig. 3. The time series contains N = 1024 
points, half of which are used for learning the nonlinear map, and 
the other half for testing the goodness of the model. We used 
k = 2m, where m is the embedding dimension of the model. In 
general, increasing the embedding dimension (up to m = 3) improves 
the prediction, but increasing q degrades the prediction. 
Nonlinear prediction of fully bleached data leads to errors that are 
in this case two orders of magnitude larger than errors obtained 
by directly fitting the raw data. 

Note that for both the raw data and for the residuals, an em- 
bedding dimension of m = 3 is in principle adequate, since the 
fractal dimension is d ≈ 1.3 [30], and a theorem of 
Sauer et al. [17] states that as long as m > 2d, the embedding will 
almost always be sufficient. 
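
A minimal sketch of a predictor of this kind is given below. It is not the implementation of Ref. [29]: the function names are hypothetical, the neighbor search is brute force, the Henon parameters are assumed standard values, and the median absolute error is used as the error measure here only because it is robust to occasional ill-conditioned local fits.

```python
import numpy as np

def henon_x(n, a=1.4, b=0.3, discard=1000):
    """Return n successive x values of the Henon map (assumed parameters)."""
    x, y = 0.0, 0.0
    out = np.empty(n)
    for i in range(n + discard):
        x, y = 1.0 - a * x * x + y, b * x
        if i >= discard:
            out[i - discard] = x
    return out

def delay_vectors(x, m):
    """Rows are (x_{t-1}, ..., x_{t-m}); the matching targets are x_t."""
    n = len(x)
    X = np.column_stack([x[m - k : n - k] for k in range(1, m + 1)])
    return X, x[m:]

def local_linear_error(x, m, k=None):
    """Learn on the first half, predict one step ahead on the second half.

    Each prediction is an affine least-squares fit to the k nearest
    neighbors (k = 2m by default) of the current delay vector.
    Returns the median absolute prediction error.
    """
    k = 2 * m if k is None else k
    half = len(x) // 2
    X_train, y_train = delay_vectors(x[:half], m)
    X_test, y_test = delay_vectors(x[half:], m)
    err = np.empty(len(y_test))
    for i, v in enumerate(X_test):
        dist2 = np.sum((X_train - v) ** 2, axis=1)
        nn = np.argsort(dist2)[:k]
        A = np.column_stack([np.ones(k), X_train[nn]])
        coef, *_ = np.linalg.lstsq(A, y_train[nn], rcond=None)
        err[i] = y_test[i] - (coef[0] + v @ coef[1:])
    return np.median(np.abs(err))

x = henon_x(1024)
for m in (1, 2, 3):
    print(m, local_linear_error(x, m))
```

The same function can be applied to a residual series in place of `x` to compare prediction of raw and bleached data.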




FIG. 3. Modeling error of a nonlinear predictor on a time series 
generated by the Henon map. For q = 0, the raw data set is used. 
For q > 0, the q-th order residuals (as computed by Eq. (2)) are 
used. The top (m = 0) curve corresponds to the amplitude of 
the order-q residuals; these decrease with increasing q. The curve 
below that is from an m = 1 model, and below that is m = 2. The 
curves for m = 3, 4, 5 are essentially the same. (The error bars are 
based on five independent runs with different realizations of the 
data.) 

We remark that fitting the residuals is different from a common 
two-step approach to fitting data that first fits a linear model, then 
fits a nonlinear model to what is left. To make this distinction 
clearer, let us write $\mathbf{x}_{t-1} = (x_{t-1}, \ldots, x_{t-m})$. The best linear 
model to the time series is $\hat{x}_t = \mathcal{L}(\mathbf{x}_{t-1})$ with $\mathcal{L}$ chosen to minimize 
the variance of the residuals $e_t = x_t - \mathcal{L}(\mathbf{x}_{t-1})$. 

Nonlinear modeling of the residuals in terms of the actual past 
time series $\mathbf{x}_{t-1}$, that is 

$$\hat{e}_t = \mathcal{N}(\mathbf{x}_{t-1}), \tag{3}$$

permits a full nonlinear model of the original time series: $\hat{x}_t = 
(\mathcal{L} + \mathcal{N})(\mathbf{x}_{t-1})$. 

By contrast, nonlinear modeling of the bleached time series 
means finding a nonlinear map $\mathcal{N}'$ which estimates $e_t$ from past 
residuals $\mathbf{e}_{t-1}$: 

$$\hat{e}_t = \mathcal{N}'(\mathbf{e}_{t-1}). \tag{4}$$

Combining $\mathcal{L}$ and $\mathcal{N}'$ into a full model for the original time series 
is possible, but far from natural. Furthermore, for chaotic data, 
we find that the estimation errors obtained with $\mathcal{N}'$ are generally 
larger than those of the more direct $\mathcal{N}$. 



Quasiperiodic data. The case against bleaching depends on the 
time series being chaotic. When applied to quasiperiodic data, the 
ill effects of bleaching are not evident. 

For our numerical experiment, we deliberately chose an exam- 
ple that was more complicated than the sum of two sine waves. The 
quasiperiodic data were generated by a nonlinear two-frequency 
model with observational and dynamical noise: 



$$x_t = x_{1,t} + x_{2,t} + [\ldots] \tag{5}$$

where $x_{i,t} = \sin(\phi_i + (2\pi/5)\gamma_i t + \eta_{i,t}) + [\ldots]$. Here, the two 
mutually incommensurate frequencies are $\gamma_1 = (\sqrt{5} - 1)/2$ and 
$\gamma_2 = \sqrt{3} - 1$. Observational noise is modeled with $\epsilon$, a Gaussian 
white noise process with unit variance; and dynamical noise is 
modeled as a random-walk phase drift: $\eta_t = \eta_{t-1} + 0.1\,\epsilon_t$, where 
$\epsilon_t$ is again Gaussian white noise with unit variance. Finally, $\phi$ is 
a randomly chosen initial phase. Five time series were generated, 
using different starting phases $\phi$ and a transient time $N_{\rm transient} = 
512$. The time series themselves were of length N = 512. Each was 
modeled by a local linear map with various embedding dimensions 
m and bleaching parameters q. We used the k = 2m nearest 
neighbors from the first half of the data set for one-step-ahead 
predictions on the second half of the data set, and computed the 
median absolute error. As seen in Fig. 4, unlike the case with 
chaotic data, bleaching does not have such a debilitating effect on 
the modeling. 

IV DETECTING NONLINEARITY 

In this section, we will describe how bleaching influences statistical 
tests for nonlinearity. The motivation behind a test for nonlinearity 
is sometimes simply to determine whether a linear model will 
capture all of the structure in the time series. Often, however, 
there is a hidden agenda. One may seek to detect nonlinearity as a 
first step in what is ultimately a search for chaos. Nonlinearity is 
certainly a pre-requisite for chaos, but testing for it is not the most 
straightforward way to test for chaos. A more direct approach might be 
to estimate the largest Lyapunov exponent. A positive Lyapunov 
exponent implies chaos, so a positive estimate would be taken as 
direct evidence in favor of chaos. The main problem with this approach 
is that the estimation of Lyapunov exponent is a nontrivial 
procedure [31, 32], and it is difficult to quantify the reliability of 
the estimate. Testing only for nonlinearity may not be as direct a 
test for chaos (the disadvantage being that a positive identification 
of nonlinearity does not imply chaos), but it can be done far more 
reliably than trying to compute a Lyapunov exponent. 

Bleaching provides a conceptually simple approach to testing 
for nonlinearity in time series. Since the residuals of a bleached 
time series have no linear correlations, any correlations that are 
found in the residuals must be nonlinear. In particular, testing the 
residuals against IID (independent and identically distributed) is 
equivalent to testing the original time series for nonlinearity. 

This is the basis of the Brock-Dechert-Scheinkman (BDS) 
test [5] (see Ref. [7] for a recent and more complete exposition), the 
tests for chaos described by Hsieh [9] (though with the financial 
time series of interest here, there is little autocorrelation to begin 
with [8]), a neural-net-based test for "neglected nonlinearity" [10], 
as well as a variety of classical nonlinearity tests [1, 2], many of 
which are reviewed in Tong [4]. To be fair, not all of these tests 
were designed with the idea of looking for chaos. Our point is that 
those tests which have bleaching as their first step will have low 
power when the test data is chaotic. We should also be careful to 
note that the test proposed by Tsay [3], though it involves residuals, 
also makes use of the original data. It has the flavor of Eq. (3) 
as opposed to Eq. (4), and unlike purely residual-based statistics, 
it may not suffer the same loss of power against chaotic time series. 

FIG. 4. Similar to Fig. 3, but instead of using chaotic data, we 
use (a) noise-free, and (b) noisy quasiperiodic data. Note that in 
contrast to the case of chaotic data, the effect of bleaching is not 
to degrade the accuracy of a nonlinear model, but on the contrary 
to improve it. Embedding dimensions shown are m = 0 (dotted 
line); m = 1 (short dashed line); m = 2 (dashed-dotted line); 
m = 3 (long dashed line); and m = 4, 5, 6, 8, 10, 12, 14, 16, 18, 20 
(solid lines). In general, the larger the m, the smaller the error 
(except the m = 0 curve, which is actually more accurate than the 
m = 1 curve). 

Instead of comparing residuals to IID, a more direct approach 
is to compare the original data to surrogate data sets which mimic 
the linear correlations in the original time series, but which are 
otherwise random [25, 33-36]; in the statistical literature, the approach 
is often identified as a bootstrap. There is some discussion 
of the connection between the surrogate data approach and the 
classic bootstrap in Refs. [36, 37]; the interested reader should also 
consult Refs. [34, 38] for pointers into the relevant literature. 

A discriminating statistic (which for chaotic processes is often 
chosen to be a dimension or Lyapunov exponent estimator, or the 
error in a nonlinear predictor, but in general can be any function 
that maps a full time series into a single number) is computed for 
each of the surrogates and for the original data set. If the number 
obtained for the original data set is significantly different from 
those obtained for the surrogate data sets, then a null hypothesis 
of linearly correlated noise can be rejected. A crude (and cheaply 
computed) measure of how significantly different the original is 
from the surrogate data is given by the number of "sigmas": 

$$\text{sigmas} = \frac{\left| \bar{Q}_{\rm surrogate} - Q_{\rm original} \right|}{\sigma_{\rm surrogate}} \tag{6}$$

Here $Q_{\rm original}$ is the value of the discriminating statistic for the 
original data set, and $\bar{Q}_{\rm surrogate}$ and $\sigma_{\rm surrogate}$ are the mean and 
standard deviation, respectively, of the discriminating statistics 
computed for the surrogate data sets. We remark that this is a 
heuristic measure. Properly, one should compute the probability 
(also called the p-value) of mis-identifying a linear time series as 
nonlinear. One way to estimate p is from the percentile ranking 
of $Q_{\rm original}$ in a sorted list of all the Q values. Only when the 
Q statistic has a distribution of some previously assumed form 
(usually Gaussian) can the p-value be computed directly from 
the number of sigmas. In general, though, the more sigmas, the 
smaller the p-value, and the more powerful the statistic. We will 
be using sigmas as an inexpensive measure of relative power. 

Formally, the method of surrogate data provides a measure of 
statistical confidence that the null hypothesis is false; informally, 
it can be used as a control experiment to assess whether the measurement 
of a given nonlinear property is being fooled by simple 
linear correlation in the time series. 

Our approach will be to compare the power of different tests 
for nonlinearity when the form of that nonlinearity is chaos. In 
statistical terminology, the null hypothesis is linearly correlated 
noise, and the alternative hypothesis is chaos. If the alternative 
hypothesis is a specific chaotic process, one can imagine designing 
very sensitive tests for distinguishing this process from the null. 
For the broad class of chaotic processes (and especially for the 
even broader class of nonlinear processes that may or may not 
be chaotic), the notion of an optimal design ceases to be well- 
posed. The emphasis here, however, will not be on finding the 
most powerful tests for nonlinearity; instead we will concentrate on 
the simpler question of how bleaching affects the power of existing 
tests when the alternative is chaos. 

In the numerical experiment shown in Fig. 5, significance was 
computed for a variety of discriminating statistics on a chaotic 
time series and on time series obtained by bleaching with ever 
larger values of q. By and large, the significance was found to 
decrease with increasing q. We also performed some experiments 
with quasiperiodic data (not shown), and we found that bleaching 
did not noticeably alter the ability of the surrogate data method 
to detect nonlinearity. We remark, however, that attempting to 
distinguish nonlinearity in quasiperiodic data is a very fussy issue. 
Stable limit cycles and limit tori arise only in nonlinear systems, 
yet the absence of chaos implies that linear models (of sufficiently 
high order) can in principle do as well as nonlinear models. This 
issue is discussed in further detail in Ref. [37]. 

In the method of surrogate data, just about any nonlinear 
statistic can be used. For example, we have found a very simple 
measure of nonlinearity that is motivated by the fact that linear 
time series have symmetric rise and fall times; the asymmetry 
in the derivative can be measured by a simple skew statistic, 
$\langle (x_t - x_{t-1})^3 \rangle$. For the experiment in Fig. 5, it is this statistic 
which we found most sensitive to the nonlinear structure in the 
time series. (See Tsay [34] for further discussion of this statistic.) 
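
The surrogate-data procedure with a skew statistic can be sketched as follows. Several things here are assumptions rather than details taken from the text: the surrogates are generated by randomizing Fourier phases (one common way to preserve the linear correlations; the algorithms of Refs. [25, 33-36] may differ), the skew statistic is taken as the unnormalized third moment of the first differences, and all function names and the default number of surrogates are hypothetical.

```python
import numpy as np

def ft_surrogate(x, rng):
    """Phase-randomized surrogate: same power spectrum, random Fourier phases."""
    n = len(x)
    X = np.fft.rfft(x - x.mean())
    phases = rng.uniform(0.0, 2.0 * np.pi, len(X))
    phases[0] = 0.0              # zero-frequency term stays real
    if n % 2 == 0:
        phases[-1] = 0.0         # Nyquist term stays real
    return np.fft.irfft(np.abs(X) * np.exp(1j * phases), n) + x.mean()

def skew_stat(x):
    """Third moment of the first differences, <(x_t - x_{t-1})^3>."""
    return np.mean(np.diff(x) ** 3)

def sigmas(x, stat, n_surr=100, seed=0):
    """Number of sigmas separating stat(x) from its surrogate distribution."""
    rng = np.random.default_rng(seed)
    q_original = stat(x)
    q_surr = np.array([stat(ft_surrogate(x, rng)) for _ in range(n_surr)])
    return abs(q_surr.mean() - q_original) / q_surr.std(ddof=1)

# Example on a linearly correlated (random-walk) series, for which the
# null hypothesis holds by construction.
rng = np.random.default_rng(1)
x = np.cumsum(rng.standard_normal(1024))
print("sigmas:", sigmas(x, skew_stat))
```

Any other function that maps a time series to a single number can be passed in place of `skew_stat`.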




FIG. 5. Significance of rejection of a null hypothesis of linearly 
correlated noise versus bleaching parameter q for a variety of dis- 
criminating statistics: modified BDS (©), a simple skew statistic 
(®), estimated correlation dimension (o), the correlation integral 
itself (□), local linear forecasting error (o), and modified McLeod- 
Li (x). For the experiment in this figure, we used a time series of 
N = 1024 points obtained by summing four independent realiza- 
tions of time series from the Henon map. The dimension, BDS, 
and forecasting statistic used an embedding dimension of m = 3. 
Although this is clearly too small to see the full dynamics in the 
time series, for the purpose of finding evidence for nonlinearity 
from a series of this length, the value m = 3 was empirically found 
to give the most significance (the skew and modified McLeod-Li 
statistics do not require an embedding). All of the discriminating 
statistics (except the modified McLeod-Li) show evidence of nonlinearity 
at the three sigma level for unbleached data (q = 0), and 
all of them fail to show evidence of nonlinearity at the three sigma 
level for the fully bleached data (q ≥ 6). 



IV.A COMPARISON TO BDS 

Brock, Dechert, and Scheinkman [5] developed a statistic to test for 
nonlinearity based on the correlation integral of Grassberger and 
Procaccia [39]. This is, to our knowledge, the first statistically 
rigorous test to exploit the "new paradigm" of deterministic chaos 
as an alternative hypothesis. To define the BDS statistic for a 
time series of N points, first define $C_m(N,r)$ as the m-dimensional 
correlation integral 

$$C_m(N,r) = \frac{2}{N(N-1)} \sum_{i=1}^{N} \sum_{j=1}^{i-1} \prod_{k=0}^{m-1} \Theta\left(r - |x_{i+k} - x_{j+k}|\right) \tag{7}$$

where $\Theta(x)$ is the Heaviside step function; it is one for positive x, 
and zero otherwise. For IID data, in the limit $N \to \infty$, one expects 
$C_m(N,r) \approx C_1(N,r)^m$. In particular, the BDS statistic 

$$Q_{\rm bds} = \sqrt{N}\left[ C_m(N,r) - C_1(N,r)^m \right] \tag{8}$$

will for IID data converge to a normal distribution with zero mean 
and fixed variance. The variance can be estimated from the data, 
but for our purposes, we find it convenient to estimate the variance 
using Monte-Carlo simulation. 

In particular, we use Qbds as the discriminating statistic in the 
scheme of surrogate data. We find that as a discriminating statis- 
tic, it is quite powerful. However, when it is applied to bleached 
data, it loses its original power. We suggest therefore that the 
BDS statistic should not be applied to residuals and compared 
against IID, but instead should be applied to the original data and 
compared against the appropriate surrogates (see Fig. 6). 
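
A brute-force sketch of the correlation integral and of the unnormalized BDS quantity is given below. It relies on the observation that a product of step functions over the m coordinates equals one exactly when the maximum-norm distance between the two delay vectors is less than r. The function names are hypothetical, the O(N^2) memory use is acceptable only for short series, and using the number of delay vectors in place of N is a simplification made here.

```python
import numpy as np

def correlation_integral(x, m, r):
    """Fraction of pairs of m-dimensional delay vectors closer than r (max norm)."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    V = np.column_stack([x[k : k + n] for k in range(m)])
    dist = np.max(np.abs(V[:, None, :] - V[None, :, :]), axis=2)
    upper = np.triu_indices(n, k=1)      # each pair counted once
    return np.mean(dist[upper] < r)

def q_bds(x, m, r):
    """sqrt(n) * [C_m - C_1^m], with n the number of delay vectors."""
    x = np.asarray(x, dtype=float)
    n = len(x) - m + 1
    c_m = correlation_integral(x, m, r)
    c_1 = correlation_integral(x[:n], 1, r)
    return np.sqrt(n) * (c_m - c_1 ** m)

rng = np.random.default_rng(0)
iid = rng.standard_normal(500)
print("Q_bds for IID noise:", q_bds(iid, 3, 0.5 * iid.std()))
```

In the surrogate-data scheme, `q_bds` would be evaluated on the original series and on each surrogate, and the results compared through the number of sigmas.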

IV.B COMPARISON TO MCLEOD-LI 

One of the most straightforward conventional approaches to testing 
for linearity in a time series is to look at the autocorrelation of the 
squared residuals. If the residuals truly are IID, then their squares 
will be IID, and therefore, the squares will have zero autocorrela- 
tion. In particular, the statistic based on sample autocorrelation 
of the squares 



$$Q_{\rm ml} = N(N+2) \sum_{k=1}^{L} \frac{r_k^2}{N-k} \tag{9}$$

where the sum runs over lags k up to a maximum lag L, and 

$$r_k = \frac{\langle e_t^2 e_{t-k}^2 \rangle - \langle e_t^2 \rangle^2}{\langle e_t^4 \rangle - \langle e_t^2 \rangle^2} \tag{10}$$

is the autocorrelation of the squared time series, will for IID data 
converge as $N \to \infty$ to a well-defined distribution. This is a particular 
case of the McLeod-Li [1] statistic [40]. As in the case of 
the BDS statistic, we can apply the statistic to unbleached data by 
simply using $x_t - \langle x_t \rangle$ in place of $e_t$ in the above formula. However, 
at least for the numerical experiment in Fig. 5, we found that this 
statistic was the weakest of our tests for nonlinearity. 
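
For concreteness, the sketch below computes a portmanteau statistic of the standard McLeod-Li form, N(N+2) times the sum over lags of r_k^2/(N-k), from the sample autocorrelation of the squared, mean-subtracted series. The function names and the maximum lag `max_lag` are hypothetical; the particular case used for the experiments is not specified here.

```python
import numpy as np

def autocorr(y, k):
    """Sample autocorrelation of y at lag k >= 1."""
    y = y - y.mean()
    return np.dot(y[k:], y[:-k]) / np.dot(y, y)

def mcleod_li(e, max_lag=20):
    """Portmanteau statistic on the autocorrelation of the squared series.

    `e` may be a residual series, or raw data (the mean is subtracted here,
    which corresponds to using x_t - <x_t> in place of e_t).
    """
    e = np.asarray(e, dtype=float)
    e = e - e.mean()
    N = len(e)
    squares = e ** 2
    lags = np.arange(1, max_lag + 1)
    r = np.array([autocorr(squares, k) for k in lags])
    return N * (N + 2) * np.sum(r ** 2 / (N - lags))

rng = np.random.default_rng(0)
print("Q_ml for IID noise:", mcleod_li(rng.standard_normal(2000)))
```

For IID input the value is expected to be of the order of `max_lag`; much larger values indicate correlation among the squares.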

Further, the McLeod-Li statistic seems to improve when the 
data set is bleached. This can be understood intuitively by realizing 
that the autocorrelation of the squared time series involves 
very large values (and therefore, very large variances). One natural 
way to reduce these values is with the following modification: 

$$Q_{\rm mml} = N(N+2) \sum_{k=1}^{L} \frac{1}{N-k} \left( r_k - A_k^2 \right)^2 \tag{11}$$

where $r_k$ is the autocorrelation in the squared time series (as before), 
and $A_k$ is the autocorrelation of the original time series, 

$$A_k = \frac{\langle x_t x_{t-k} \rangle - \langle x_t \rangle^2}{\langle x_t^2 \rangle - \langle x_t \rangle^2}. \tag{12}$$

The idea is to "subtract off" that much of the autocorrelation 
of the squares which can be attributed to the autocorrelation in 
the original time series. Fig. 7 shows that the new statistic is more 
powerful when used with surrogate data; and for the data set under 
consideration, is optimal for a small value of q. 
FIG. 6. Significance of the BDS statistic as a function of bleaching 
q. The time series is N = 1024 points obtained by adding four 
independent realizations of Henon time series. As with the single 
Henon time series, full bleaching occurs at q = 6. The circle (o) 
curve uses BDS to test against a null of IID noise. Not surprisingly, 
the null is easily rejected for unbleached data, because there 
are both linear and nonlinear correlations, and the test doesn't 
distinguish them. However, applying the test to bleached data, 
we find little evidence to reject the null of IID residuals. On the 
other hand, the square (□) curve uses the BDS statistic as part 
of a surrogate data algorithm to test directly against the null hypothesis 
of linearly correlated noise. This test is less significant 
at q = 0, but that's because it is testing against a more general 
null. It too loses significance as q increases. But what should be 
compared here is the q = 0 square (■) point, and the q = 6 circle 
(•) point; the former is significant at the five sigma level, while 
the latter is not significant. The former uses the BDS to test the 
raw time series against a null of linearly correlated noise; the latter 
uses BDS (as it was originally intended) to test residuals against 
IID; though the two tests are formally equivalent, the direct test 
that avoids bleaching is the more powerful. 
FIG. 7. Significance of nonlinearity for a time series of N = 
1024 points obtained by summing four realizations of the Henon 
map, using the McLeod-Li statistic (o) and a modified version of 
McLeod-Li (□) described in Eq. (11). 

V SOME GENERAL REMARKS ON LINEAR FILTERING 

In the case of the Henon attractor, bleaching is found to be detri- 
mental both to nonlinear modeling and to detecting nonlinearity. 
But it would be incorrect to assume that all linear filtering is in all 
cases bad. Given a particular data set, and a particular nonlinear 
task, one expects that there is a particular linear prefiltering that 
will optimize the performance at the given task. The theme of 
this article is that the particular linear filter that corresponds to 
bleaching is rarely optimal, and usually makes things worse. 

In this section, we will give two examples of situations that arise 
frequently in practice. In both cases, linear prefiltering is seen to be 
advantageous, but in neither case is full bleaching recommended. 

V.A UNFILTERING FILTERED DATA 

A natural example is to begin with a known chaotic time series, 
and then to low-pass filter the data, so as to introduce a lot of 
linear correlation in the data. For example, if $h_t$ is a chaotic time 
series, and $|a| < 1$, then 

$$x_t = a x_{t-1} + h_t \tag{13}$$

gives a time series $x_t$ which for a near 1 is dominated by the linear 
component [41]. While this example may appear at first sight 
contrived, it represents a very common physical occurrence: the 
observation of a natural phenomenon through a low-pass filter. For 
instance, a resistance R and capacitance C between the probe and 
the phenomenon being measured leads to a characteristic time of 
RC, and corresponds to $a = e^{-\Delta t/RC}$ in Eq. (13), where $\Delta t$ is the sampling interval. This is certainly 
the situation for the example of scalp-based measurements of brain 
electrical activity, as in the electroencephalogram (EEG). 

In this case, it can be advantageous to digitally filter the ob- 
served time series to counteract the effect of the filter through 
which the data were observed. However, it is still not recommended 
to fully bleach the data! In particular, Fig. 8 shows that for a sum 
of four Henon time series, prefiltered with a = 0.9, the optimum 
amount of bleaching is given at q = 1 or 2. However, from the 
point of view of linear modeling, q = 7 is the "proper" amount of 
bleaching for this time series (based, as in Fig. 2, on AIC, Schwarz, 
and out-of-sample error criteria). At q = 7, the significance of the 
evidence for nonlinearity is negligible. The evidence at q = 0 is 
not very significant (depending on the discriminating statistic), so 
there is a real advantage to a "little" bleaching to remove a dominating 
linear component. 
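
The relation between the observing filter of Eq. (13) and a low-order "unfiltering" step can be made concrete with a short sketch. When the coefficient a is known exactly, the recursive filter is undone by a first-order nonrecursive filter; in practice a is not known and a fitted q = 1 filter only approximates this inverse. The function names are hypothetical, and the zero initial condition is an assumption.

```python
import numpy as np

def ar1_filter(h, a):
    """Recursive low-pass filter x_t = a x_{t-1} + h_t, started from zero."""
    x = np.empty(len(h))
    prev = 0.0
    for t, h_t in enumerate(h):
        prev = a * prev + h_t
        x[t] = prev
    return x

def unfilter(x, a):
    """Nonrecursive inverse h_t = x_t - a x_{t-1} (a first-order FIR filter)."""
    x = np.asarray(x, dtype=float)
    h = x.copy()
    h[1:] -= a * x[:-1]
    return h

rng = np.random.default_rng(0)
h = rng.uniform(-1.0, 1.0, 500)      # stand-in for the underlying signal
x = ar1_filter(h, 0.9)               # what is observed through the filter
print("max reconstruction error:", np.max(np.abs(unfilter(x, 0.9) - h)))
```

This is consistent with the observation that a small amount of bleaching (q = 1 or 2) helps for prefiltered data, while bleaching at the order preferred by linear criteria goes well beyond undoing the observing filter.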
FIG. 9. Bleaching oversampled data; the Lorenz time series was 
sampled at a rate Δt = 0.02, and residuals were computed for 
q = 1, 2, 3. While bleaching significantly reduced the magnitude of 
the residuals, it produced in its place a very "spikey" time series 
that is more difficult to analyze than the raw data. 



FIG. 8. Here, the chaotic time series is obtained by AR filtering 
(a = 0.9) a time series of four independent Henon maps summed 
together. The time series is bleached at several values of q. As 
before, the embedding dimension is m = 3. Some bleaching (at 
q = 1 or 2) leads to significant evidence for nonlinearity, but full 
bleaching (at q = 7 for this time series) gives time series with no 
detectable nonlinear structure. Here, the discriminating statistics 
used were: modified BDS (©), skew (®), correlation dimension (o), 
correlation integral (□), and local linear forecasting error (o). 

In a more practical situation, if one is seeking evidence for 
nonlinearity in a time series of sea levels, it can be advantageous 
to "filter out" the daily and monthly tides which dominate the 
variations [42]. We also note that Townshend [43] reported im- 
proved modeling of speech signals after linear filtering; we suspect 
that this is due to the dominant underlying periodicity of these 
signals. 

V.B BLEACHING OVERSAMPLED DATA 

Data which are sampled at a much higher rate than that of the 
underlying physical process will have very little power in the high 
frequencies. Since the effect of bleaching is to achieve equal power 
at all frequencies, the effect on oversampled data is to grossly am- 
plify the high frequency behavior. 

For noise-free oversampled data, the residuals will have very 
small amplitude compared to the original data, but the enhance- 
ment of the high frequencies will lead to very irregular and "spikey" 
dynamics. Fig. 9 shows this effect with the Lorenz attractor [44]; 
data from the Rossler attractor [45], which has a more pronounced 
periodicity, shows the effect even more severely. 

For a time series which is oversampled from a continuous flow 
but whose measurement is contaminated with uncorrelated addi- 
tive noise, the effect of bleaching is to amplify the noise. Because 
this situation is so common in physics experiments, it is sometimes 
difficult for physicists to imagine why one would ever want 
to bleach data in the first place. The physicist's intuition in this 
case is absolutely correct. This is an example where it is not only 
unwise to bleach the data, but it is often helpful to filter the data 
with a low-pass filter, making it less white than the original signal. 

VI THEORETICAL CONSIDERATIONS 

We have emphasized numerical experiments in this exposition, 
partly because these provide graphic demonstrations of the phe- 
nomena, but also because we do not have a good "theory" of why 
bleaching should be so detrimental to so many different aspects 
of nonlinear time series analysis. Intuitively, the linear filtering 
replaces the current state with a linear combination of states at 
previous times, and the effect of this combination is to confuse the 
meaning of the current state; this intuition is made more precise 
in the following section. 

VI.A LIMIT OF INFINITE DATA 

When an infinite amount of data is available, then conditions such 
as those plotted in Fig. 2 do not put a cap on the order of the 
bleaching filter. That is, one may have $q \to \infty$ in Eq. (2). This
is no longer a finite-impulse-response (FIR) filter, but is an
infinite-impulse-response (IIR) filter, and so the theorems of Refs. [6, 17-20]
no longer apply. The filtered time series is no longer guaranteed 
to preserve the nonlinear invariants, such as attractor dimension, 
of the original time series. In this section, we describe conditions 
under which a particular invariant, the Lyapunov dimension, is 
altered. We speculate that these conditions will apply to more 
general invariants as well. 

The Lyapunov dimension was defined by Kaplan and Yorke [46]
as part of a conjecture that related Lyapunov exponents to fractal
dimension. If $\lambda_1 \ge \lambda_2 \ge \cdots$ are the ordered Lyapunov exponents
of a dynamical system, and $k$ is the largest integer such that
$\lambda_1 + \cdots + \lambda_k \ge 0$, then the Lyapunov dimension is given by

$$D_\lambda = k + \frac{\lambda_1 + \cdots + \lambda_k}{|\lambda_{k+1}|}. \qquad (14)$$

Note that the Lyapunov dimension depends only on the largest
$k + 1$ Lyapunov exponents.
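
As an illustration of Eq. (14), the following sketch evaluates the Lyapunov dimension from a list of exponents. The function name and the handling of the edge cases (no non-negative partial sum, or a partial sum that never turns negative) are our own choices, and the exponent values are approximate literature values for the Hénon map and the Lorenz system, quoted here only as examples.

```python
def lyapunov_dimension(exponents):
    """Kaplan-Yorke (Lyapunov) dimension of Eq. (14)."""
    lam = sorted(exponents, reverse=True)
    partial_sum, k = 0.0, 0
    # k ends as the largest count with lam[0] + ... + lam[k-1] >= 0.
    for value in lam:
        if partial_sum + value < 0.0:
            break
        partial_sum += value
        k += 1
    if k == len(lam):
        # The partial sums never become negative; return the full dimension.
        return float(k)
    return k + partial_sum / abs(lam[k])

henon = [0.419, -1.623]          # approximate values (natural logarithms)
lorenz = [0.906, 0.0, -14.572]   # approximate values
print(lyapunov_dimension(henon), lyapunov_dimension(lorenz))
```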

The effect of a general (causal [47]) IIR filter is to add new
negative Lyapunov exponents to the dynamics. This is readily seen
in the case of the AR(1) filter. As discussed by Badii et al. [22],
the filter

$$e_t = b\,e_{t-1} + x_t \qquad (15)$$

adds a new variable ($e_t$) to the dynamical system, and a new
Lyapunov exponent $\lambda = \log|b|$. A higher order AR($q$) filter

$$e_t = \sum_{k=1}^{q} b_k e_{t-k} + x_t \qquad (16)$$

can be factored to give $q$ new variables, and $q$ new Lyapunov
exponents. If $z_1, \ldots, z_q$ denote the $q$ roots of the associated polynomial
$Q(z) = 1 - \sum_{k=1}^{q} b_k z^k = (1 - z/z_1) \cdots (1 - z/z_q)$, then we can write
Eq. (16) as a system of $q$ equations:

$$\begin{aligned}
e^{(1)}_t &= (1/z_1)\, e^{(1)}_{t-1} + e^{(2)}_t \\
&\;\;\vdots \\
e^{(q)}_t &= (1/z_q)\, e^{(q)}_{t-1} + x_t
\end{aligned} \qquad (17)$$

and the new Lyapunov exponents are given by $\lambda_i = \log(|1/z_i|) =
-\log|z_i|$ for $i = 1, \ldots, q$. Note that the roots $z_i$ must all lie outside
the unit circle ($|z_i| > 1$) for the filter in Eq. (16) to be stable, and
in this case all the new Lyapunov exponents are negative.
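
For a second-order filter the factorization can be carried out in closed form. The sketch below assumes an AR(2) filter with hypothetical coefficients $b_1 = 0.5$ and $b_2 = 0.3$, finds the two roots of $Q(z) = 1 - b_1 z - b_2 z^2$ with the quadratic formula, and converts them to the new Lyapunov exponents $-\log|z_i|$.

```python
import cmath
import math

def ar2_roots(b1, b2):
    """Roots of Q(z) = 1 - b1*z - b2*z**2 (requires b2 != 0)."""
    # Q(z) = 0 is equivalent to b2*z**2 + b1*z - 1 = 0.
    disc = cmath.sqrt(b1 * b1 + 4.0 * b2)
    return [(-b1 + disc) / (2.0 * b2), (-b1 - disc) / (2.0 * b2)]

def new_lyapunov_exponents(roots):
    """Exponents -log|z_i| contributed by the filter."""
    return [-math.log(abs(z)) for z in roots]

b1, b2 = 0.5, 0.3  # hypothetical filter coefficients
roots = ar2_roots(b1, b2)
exponents = new_lyapunov_exponents(roots)
stable = all(abs(z) > 1.0 for z in roots)
print(roots, exponents, stable)
```

Since the product of the roots is $-1/b_2$, the two new exponents sum to $\log|b_2|$, which gives a quick consistency check on the factorization.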

If we rewrite the AR($q$) filter above in terms of its equivalent
MA($\infty$) filter; that is,

$$e_t = \sum_{k=0}^{\infty} a_k x_{t-k}, \qquad (18)$$

then the polynomial

$$P(z) = \sum_{k=0}^{\infty} a_k z^k \qquad (19)$$

will satisfy $P(z) = 1/Q(z)$ and will have poles $z_1, \ldots, z_q$ where
$Q(z)$ has roots.

All of this is motivation for the following statements: The new
Lyapunov exponents generated by an IIR filter given in Eq. (18)
are $\lambda_i = -\log|z_i|$ where $z_i$ are the poles of the polynomial in
Eq. (19). If the filter is invertible and has bounded coefficients $a_k$,
then there will be no poles or zeros inside the unit circle.

Now, we wish to consider the particular IIR filter that corresponds
to bleaching. This is given by Eq. (2) with $q = \infty$, and has
the property

$$\langle e_t e_{t-\tau} \rangle = \sigma^2 \delta_{0,\tau}. \qquad (20)$$

Let us write the causal inverse of the filter in Eq. (2) as

$$x_t = \sum_{k=0}^{\infty} b_k e_{t-k} \qquad (21)$$

where the $b$'s are given as the coefficients of the polynomial

$$Q(z) = \frac{1}{P(z)}. \qquad (22)$$

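We can exploit the condition in Eq. (20) by writing

$$\begin{aligned}
\langle x_t x_{t+k} \rangle &= \sum_{i=0}^{\infty} \sum_{j=0}^{\infty} b_i b_j \langle e_{t-i} e_{t+k-j} \rangle \\
&= \sigma^2 \sum_{i=0}^{\infty} b_i b_{i+k}. \qquad (23)
\end{aligned}$$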

If we introduce the autocorrelation "generating function"
(Ref. [48], Sec. 5.7.1) which is the polynomial

$$A(z) = \sum_{k=-\infty}^{\infty} A_k z^k \qquad (24)$$

where $A_k$ is the autocorrelation function defined in Eq. (12), it is
not hard to show that

$$A(z) = c\,Q(z)\,Q(z^{-1}) \qquad (25)$$

where $c$ is a constant multiplier. Thus, if $z_0$ is a root of $Q(z)$, then
both $z_0$ and $z_0^{-1}$ are roots of $A(z)$. It follows that the largest new
Lyapunov exponent introduced by the bleaching is given by

$$\lambda_0 = -\log|z_0| \qquad (26)$$

where $z_0$ is the smallest root of the autocorrelation generating
function $A(z)$ that is outside the unit circle.
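
The root structure of the generating function can be checked on a toy case. The sketch below assumes a hypothetical two-term inverse filter $x_t = e_t + 0.5\,e_{t-1}$ with $\sigma^2 = 1$, so that the polynomial with the $b$'s as coefficients is $1 + 0.5z$ with a single root at $z_0 = -2$. It builds $A_k$ from Eq. (23), evaluates the generating function of Eq. (24), and confirms that both $z_0$ and $z_0^{-1}$ are roots.

```python
import math

def autocovariance(b, k, sigma2=1.0):
    """A_k = sigma^2 * sum_i b_i b_{i+|k|}, as in Eq. (23), for finitely many b's."""
    k = abs(k)
    return sigma2 * sum(b[i] * b[i + k] for i in range(len(b) - k))

def generating_function(b, z, sigma2=1.0):
    """A(z) = sum_k A_k z**k over k = -(len(b)-1), ..., len(b)-1."""
    order = len(b) - 1
    return sum(autocovariance(b, k, sigma2) * z ** k
               for k in range(-order, order + 1))

b = [1.0, 0.5]      # hypothetical inverse filter x_t = e_t + 0.5*e_{t-1}
z0 = -1.0 / b[1]    # root of 1 + 0.5*z, outside the unit circle
lambda0 = -math.log(abs(z0))  # new Lyapunov exponent of Eq. (26)
print(generating_function(b, z0), generating_function(b, 1.0 / z0), lambda0)
```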

Since the Lyapunov dimension $D_\lambda$ depends only on the largest
$D$ Lyapunov exponents, where $D = \lceil D_\lambda \rceil$, a new Lyapunov
exponent will change the Lyapunov dimension only if it is larger than
$\lambda_D$, the $D$th largest Lyapunov exponent of the original dynamics.
These $D$ largest Lyapunov exponents are the only ones accessible
from a trajectory that is on the attractor; Cenys [49] calls these
the "internal" Lyapunov exponents.

Therefore, a bleaching filter will change the Lyapunov dimension
whenever $\lambda_0$ of Eq. (26) is greater than the smallest accessible
(or "internal") Lyapunov exponent $\lambda_D$. One can think of this in
terms of two time scales: one is the "linear" timescale associated
with the autocorrelation function, and the other is the "nonlinear"
timescale associated with the smallest accessible Lyapunov
exponent. When the linear timescale is longer than the nonlinear
timescale, then bleaching will, in the infinite data limit, actually
change the structure of the attractor.
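
The criterion can be made concrete with a short sketch, under stated assumptions: the original exponents are approximate literature values for the Hénon map, and the filter is the first-order recursion $e_t = b\,e_{t-1} + x_t$, which contributes the single new exponent $\log|b|$. The two values of $b$ are hypothetical, chosen to fall on either side of the threshold.

```python
import math

def lyapunov_dimension(exponents):
    """Kaplan-Yorke dimension: k + (sum of the k largest) / |next exponent|."""
    lam = sorted(exponents, reverse=True)
    total, k = 0.0, 0
    while k < len(lam) and total + lam[k] >= 0.0:
        total += lam[k]
        k += 1
    return float(k) if k == len(lam) else k + total / abs(lam[k])

def dimension_after_filter(original_exponents, filter_coefficient):
    """Lyapunov dimension once the exponent log|b| of the filter is added."""
    new_exponent = math.log(abs(filter_coefficient))
    return lyapunov_dimension(list(original_exponents) + [new_exponent])

henon = [0.419, -1.623]                        # approximate values
d_original = lyapunov_dimension(henon)
d_weak = dimension_after_filter(henon, 0.1)    # log(0.1) lies below -1.623
d_strong = dimension_after_filter(henon, 0.9)  # log(0.9) lies above -1.623
print(d_original, d_weak, d_strong)
```

With $b = 0.1$ the new exponent is more negative than the smallest original exponent and the dimension is unchanged; with $b = 0.9$ the new exponent enters the sum and the dimension rises above two.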

We have already seen, however, that even finite-order bleaching 
can have a dramatic effect on estimates of nonlinear invariants, and 
in the following section we outline an approach for quantifying that 
effect. 

VI.B FILTERING, EMBEDDING, AND PROJECTING

As noted in Refs. [17-19], the issue of prefiltering can be recast as 
an embedding problem. Given a time series xt, one can ask which 
of the following "embeddings" best describes the actual state of 
the system at time t: 



$$(x_t, x_{t-1}, x_{t-2}); \text{ or} \qquad (27)$$

$$(e_t, e_{t-1}, e_{t-2}) = (x_t - a x_{t-1},\; x_{t-1} - a x_{t-2},\; x_{t-2} - a x_{t-3}). \qquad (28)$$



And in fact, both of these are projections from the higher dimensional
space $(x_t, x_{t-1}, x_{t-2}, x_{t-3})$. Indeed, the two panels in
Fig. 1 can be viewed as two different projections from the eight-dimensional
space $(x_t, x_{t-1}, \ldots, x_{t-7})$. Thus the twin issues of
optimal filtering and optimal embedding can both be rephrased in
terms of optimal projection.
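
The two projections can be written explicitly as matrices acting on the four-dimensional delay vector. In the sketch below the function names, the filter coefficient $a = 0.5$, and the numerical delay vector are all hypothetical; the first matrix keeps the leading three coordinates, and the second forms the residuals of the first-order filter $e_t = x_t - a x_{t-1}$.

```python
def truncation_projection(m):
    """m x (m+1) matrix that keeps the first m delay coordinates."""
    return [[1.0 if j == i else 0.0 for j in range(m + 1)] for i in range(m)]

def bleaching_projection(a, m):
    """m x (m+1) matrix mapping (x_t, ..., x_{t-m}) to (e_t, ..., e_{t-m+1})."""
    rows = []
    for i in range(m):
        row = [0.0] * (m + 1)
        row[i] = 1.0       # coefficient of x_{t-i}
        row[i + 1] = -a    # coefficient of x_{t-i-1}
        rows.append(row)
    return rows

def apply(matrix, vector):
    """Matrix-vector product for plain nested lists."""
    return [sum(c * v for c, v in zip(row, vector)) for row in matrix]

delay_vector = [0.8, -0.3, 1.1, 0.4]  # hypothetical (x_t, x_{t-1}, x_{t-2}, x_{t-3})
a = 0.5                               # hypothetical filter coefficient
raw = apply(truncation_projection(3), delay_vector)
bleached = apply(bleaching_projection(a, 3), delay_vector)
print(raw, bleached)
```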

There are a number of criteria for judging the quality of an 
embedding. Operational criteria would define the fitness of an 
embedding in terms of how well it permits nonlinear forecasting 
or dimension estimation. More direct criteria have also been pro- 
posed [50-53]. 

In particular, the approach suggested by Casdagli et al. [50]
compares different embeddings according to how measurement
noise is amplified when the embedded state is mapped back to the
original state space. The authors define a "distortion" $\delta$ which is
related to this amplification. In this section, we will measure $\delta$ for
bleached and unbleached data. We will also introduce a new quantity,
$\gamma$, which we will call "stretching;" this measures how much a
spherical (infinitesimal) noise ball in the original state space will
be stretched in going to the embedded space. This new quantity,
though also a local quantity (by which we mean it does not depend
on global information in the attractor, such as how the attractor
is "folded" by the dynamics), provides complementary information
about the embedding.

Following Casdagli et al. [50], let $\Phi$ be the map that takes the
actual state into the time delay embedding: $\Phi : R^d \to R^{m+q}$,
where $d$ is the dimension of the actual state space. Let $\Psi_t$ and
$\Psi_b$ denote two different projections of the time delay embedding
from $m + q$ coordinates to $m$ coordinates. $\Psi_t$ is just the map
that projects out the first $m$ coordinates, and $\Psi_b$ is the projection
that corresponds to bleaching the time series. Let $D\Phi$ and $D\Psi$ be
the Jacobians of these maps; in general they will depend on the
location in state space.

From $D\Phi$ and $D\Psi$, a "distortion" matrix can be defined [50,
Eq. (75)]

$$\Sigma = \left[ D\Phi^T D\Psi^T \left( D\Psi\, D\Psi^T \right)^{-1} D\Psi\, D\Phi \right]^{-1} \qquad (29)$$

and the distortion itself is given by $\delta = \sqrt{\mathrm{Trace}(\Sigma)}$. Casdagli et
al. [50] have noted that if $\Psi$ is invertible, there will be no effect
at all on distortion. However, even if the filter is invertible, the
matrix $\Psi$ is still a projection, and it is not invertible.

We define the stretching matrix simply as the inverse of the
distortion matrix, and so the stretching itself is $\gamma = \sqrt{\mathrm{Trace}(\Sigma^{-1})}$.
Note that while the distortion is sensitive to large eigenvalues of
$\Sigma$, the stretching is sensitive to small eigenvalues of $\Sigma$. A more
comprehensive theory might consider the full eigenvalue spectrum.
Note also, in comparison with Eq. (83) of Ref. [50], that this is a
local quantity that appears related to estimation error.
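
A minimal sketch of how the distortion and stretching might be evaluated at a single point is given below. It takes the distortion matrix to be $\Sigma = [D\Phi^T D\Psi^T (D\Psi D\Psi^T)^{-1} D\Psi D\Phi]^{-1}$, as in Eq. (29), and works with plain nested lists so that it has no external dependencies. The Jacobian `d_phi` is a hypothetical numerical example with $d = 2$, $m = 3$, $q = 1$, not one computed from the Hénon map, and the helper names are our own.

```python
import math

def transpose(a):
    return [list(col) for col in zip(*a)]

def matmul(a, b):
    bt = transpose(b)
    return [[sum(x * y for x, y in zip(row, col)) for col in bt] for row in a]

def inverse(a):
    """Gauss-Jordan inverse with partial pivoting (small dense matrices only)."""
    n = len(a)
    aug = [list(map(float, row)) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-14:
            raise ValueError("matrix is singular to working precision")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def trace(a):
    return sum(a[i][i] for i in range(len(a)))

def distortion_matrix(d_phi, d_psi):
    """Sigma of Eq. (29); d_phi is (m+q) x d and d_psi is m x (m+q)."""
    gram_inv = inverse(matmul(d_psi, transpose(d_psi)))
    projector = matmul(matmul(transpose(d_psi), gram_inv), d_psi)
    return inverse(matmul(matmul(transpose(d_phi), projector), d_phi))

def distortion_and_stretching(d_phi, d_psi):
    sigma = distortion_matrix(d_phi, d_psi)
    return math.sqrt(trace(sigma)), math.sqrt(trace(inverse(sigma)))

# Hypothetical Jacobian of the delay map at one point (d = 2, m + q = 4).
d_phi = [[1.0, 0.0], [0.0, 1.0], [0.5, -0.2], [0.3, 0.7]]
psi_t = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]               # keep first m = 3
psi_b = [[1, -0.5, 0, 0], [0, 1, -0.5, 0], [0, 0, 1, -0.5]]      # AR(1) bleaching
delta_t, gamma_t = distortion_and_stretching(d_phi, psi_t)
delta_b, gamma_b = distortion_and_stretching(d_phi, psi_b)
print(delta_t, gamma_t, delta_b, gamma_b)
```

Because Eq. (29) involves $D\Psi$ only through the orthogonal projector onto its row space, rescaling $D\Psi$ by a constant leaves both quantities unchanged; only the subspace selected by the projection matters.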

In Fig. 10, we compare the distortion for the embeddings of
a Hénon time series bleached at increasing levels of $q$. We again
remark that an embedding dimension of $m = 3$ is sufficient for all
finite values of $q$ because the Hénon attractor has a dimension $d \approx
1.3$, and $m > 2d$. It appears from these figures that bleaching does
not induce considerable distortion, but that it does a phenomenal
amount of stretching.

Another way of looking at what is happening can be seen in
Fig. 11. Here, distances between pairs of residuals ($\Delta e_{ij} = |e_i - e_j|$)
are plotted as a scatterplot against the corresponding distances
for the original time series ($\Delta x_{ij} = |x_i - x_j|$). Again, we use
an embedding dimension of $m = 3$, so that the attractors do not
overlap themselves. We see a large population of points for which
$\Delta e \gg \Delta x$; these are pairs of points which are close in the original
coordinates, but have been stretched far apart in the residual
coordinates.





FIG. 10. Mean (□) and maximum (O) distortion, and mean
stretching (o) as a function of bleaching order $q$, for an $m = 3$
embedding of the Hénon time series. The effect of bleaching on
distortion is quite small; on average it is very near unity, and at
the few points where the effect is maximal, it is only of order
ten. The average stretching, by contrast (compare circles (o) with
squares (□)), increases dramatically with $q$.

VII CONCLUDING REMARKS 

In a variety of numerical experiments, we have described the ill 
effects of bleaching on nonlinear models of chaotic time series data. 
We have shown in particular that for detecting nonlinearity, it is
often better to compare the given time series with stochastic data
that mimics its autocorrelation than to try to subtract out the
autocorrelation altogether. This led us to suggest modifications 
to some standard residual-based statistics, among them the BDS 
and the McLeod-Li statistics. From the point of view of model 
building, we have seen that fitting of residuals can cost several 
orders of magnitude in accuracy of fit, compared to fitting the 
original data. On the other hand, having demonstrated cases where 
linear prefiltering is disadvantageous, we have also seen cases where 
some linear filtering helps. 

We have also done experiments with the correlation dimension, 
and while these results are not shown (but see Sauer and Yorke [19] 
for a demonstration of how linear filtering can affect estimates of 
correlation dimension), these estimates are also seriously degraded 
by the effects of bleaching. Although we have not done the relevant 
numerical experiments, we suspect that indiscriminate bleaching 
will have a similarly deleterious effect on estimates of Lyapunov 
exponent, or upon the tests for determinism advocated by Cas- 
dagli [54] and Kaplan [55]. 

Brock [6] has noted that residual-based statistics "may mis- 
identify deterministic chaos as random noise in a short data set," 
but chose to use a residual-based statistic in his study for reasons 
that were to some extent motivated by the considerable interest at 
the time in AR(2) models with roots near the unit circle [56]; for 
such systems, as we noted in Sect. V.A, pre-whitening can help. 
More recent modeling by Brock et al. [57] used a direct resampling
(surrogate data) method for rejecting a variety of null hypotheses.
While these conclusions suggest that earlier failures to detect
nonlinearity in various economic time series may be vulnerable to
more powerful tests that are not based on residuals, we consider
this unlikely, because our tests are more powerful when the alternative
hypothesis is chaos, and we have seen no convincing evidence
of chaos in financial time series. On the other hand, we are saying
that if chaos is the alternative hypothesis, then residual based
statistics are probably not as powerful as direct comparisons with
similarly autocorrelated (surrogate) data.

FIG. 11. The distance between a pair of points in the residual
embedding ($\Delta e$) is plotted against the distance between the same
pair of points in the original space ($\Delta x$). Note that there are many
points for which the distance in the residual embedding space is
much larger than the corresponding distance in the original embedding,
i.e., $\Delta e \gg \Delta x$. There are relatively fewer points for which
$\Delta e \ll \Delta x$, which suggests that stretching, and not distortion, is
the dominant factor in this case. The time series is from the Hénon
map and the embedding dimension is $m = 3$. (a) Only a single
bleaching term, $q = 1$. (b) Full bleaching, $q = 6$.

Scargle [58] has suggested that a kind of nonlinear Wold de- 
composition theorem can be derived in which the chaotic process 
is rewritten as a linear filter of "white chaos." This uncorrelated 
process is just the residual time series $e_t$ of Eq. (2), and our main
point in this article is that the residual time series can be much 
more complicated and difficult to work with than the raw time 
series. White chaos pays a price for its whiteness. Actually, the 
algorithm Scargle used for determining the chaotic innovation was 
more complicated than that of Eq. (2), and in a later paper [59], he 
recognizes that this algorithm does not in general produce a time 
series that is in fact uncorrelated. We do not know if the effective 
prefilter that Scargle ultimately proposes is in general beneficial or
detrimental to nonlinear modeling of the time series. 

Sugihara and May [60] have noted that their test for chaos 
based on prediction error can be fooled by autocorrelated noise, 
and they suggest first-differencing as a method of removing auto- 
correlation. Although this may be useful in some cases, we argue 
that this general approach is likely to be problematic on several 
counts. One, first differencing does not necessarily remove auto- 
correlation, and in some cases can enhance it; two, in cases where 
autocorrelation is not removed, the test is still vulnerable to lin- 
ear artifacts; and three, even if the autocorrelation is significantly 
removed, the state space structure can become significantly dis- 
torted, and the power of the test for detecting nonlinearity (let 
alone chaos) will have been compromised. 
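
The first of these points can be illustrated with the simplest possible case: white noise has essentially no autocorrelation, but its first difference has a lag-one autocorrelation near $-1/2$. The sketch below is ours (Gaussian noise, a fixed seed, and a hypothetical helper name) and is not drawn from Ref. [60].

```python
import random

def lag1_autocorrelation(x):
    """Sample lag-one autocorrelation of a series."""
    n = len(x)
    mean = sum(x) / n
    centered = [v - mean for v in x]
    num = sum(centered[t] * centered[t - 1] for t in range(1, n))
    den = sum(c * c for c in centered)
    return num / den

random.seed(1)  # fixed seed, for reproducibility only
white = [random.gauss(0.0, 1.0) for _ in range(20000)]
differenced = [white[t] - white[t - 1] for t in range(1, len(white))]

r_white = lag1_autocorrelation(white)
r_diff = lag1_autocorrelation(differenced)
print(f"white: {r_white:.3f}, first difference: {r_diff:.3f}")
```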

APPENDIX: DEMONSTRATION THAT BEST FIT RESIDUALS ARE WHITE

In this appendix, we show that in the limit $q \to \infty$, the best linear
fit leads to uncorrelated residuals.

Let $\hat{x}_t$ be the "best" linear estimator for $x_t$, in the sense of
minimizing the variance $\langle e_t^2 \rangle$ of the residuals, where $e_t = x_t - \hat{x}_t$.
Consider an arbitrary time delay $\tau > 0$, and let

$$\lambda = \frac{\langle e_t e_{t-\tau} \rangle}{\langle e_t^2 \rangle}. \qquad (30)$$

Since we want to show that the residuals are uncorrelated, what we
want to show is that $\lambda = 0$. Our approach will be to show that if
$\lambda \ne 0$, then a better linear estimator than $\hat{x}_t$ can be constructed,
contradicting the hypothesis that $\hat{x}_t$ was optimal.

Begin by noting that a good estimator for $e_t$ is given by

$$\hat{e}_t = \lambda e_{t-\tau} \qquad (31)$$

so that a new linear estimator for $x_t$ can be defined by

$$\begin{aligned}
\tilde{x}_t &= \hat{x}_t + \hat{e}_t \\
&= \hat{x}_t + \lambda (x_{t-\tau} - \hat{x}_{t-\tau}).
\end{aligned} \qquad (32)$$

Note that this too is an ordinary linear estimator for $x_t$ in terms of
past values $(x_{t-1}, \ldots)$. Note also that if $\hat{x}$ were restricted to finite
order $q$, then $\tilde{x}$ would be of order $\tau + q$, so this argument does
not apply to finite estimators, except through a separate result
which we will not show here (see, for instance, Theorem 7.6.6 in
Anderson [48]) that finite-order estimators approximate
infinite-order estimators as $q \to \infty$.

Let $\tilde{e}_t$ be the residuals from this new estimator: $\tilde{e}_t = x_t - \tilde{x}_t$.
Then

$$\begin{aligned}
\langle \tilde{e}_t^2 \rangle &= \langle (x_t - \tilde{x}_t)^2 \rangle \\
&= \langle (x_t - [\hat{x}_t + \hat{e}_t])^2 \rangle \\
&= \langle ([x_t - \hat{x}_t] - \hat{e}_t)^2 \rangle \\
&= \langle (e_t - \lambda e_{t-\tau})^2 \rangle \\
&= \langle e_t^2 \rangle - 2\lambda \langle e_t e_{t-\tau} \rangle + \lambda^2 \langle e_{t-\tau}^2 \rangle \\
&= (1 - \lambda^2) \langle e_t^2 \rangle.
\end{aligned} \qquad (33)$$

From our original hypothesis that $\hat{x}$ was the optimum linear predictor,
we have $\langle e_t^2 \rangle \le \langle \tilde{e}_t^2 \rangle$, which requires $\lambda = 0$, and implies
that $\langle e_t e_{t-\tau} \rangle$ is zero for all $\tau > 0$. That is, the residuals have no
autocorrelation; they are white.
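
The construction in this appendix can be checked numerically. The sketch below is an illustration under our own assumptions: the data are generated by a hypothetical AR(2) process, the "estimator" is deliberately suboptimal (it uses only the first lag), and $\lambda$ is taken as the sample regression coefficient of $e_t$ on $e_{t-\tau}$, which coincides with Eq. (30) for a stationary series. Correcting the estimator by $\lambda e_{t-\tau}$ then cannot increase the mean squared residual.

```python
import random

def mean(values):
    return sum(values) / len(values)

def improved_residuals(e, tau):
    """Return lam, e_t, and e_t - lam*e_{t-tau} over the overlapping range."""
    current, lagged = e[tau:], e[:-tau]
    lam = (mean([c * l for c, l in zip(current, lagged)])
           / mean([l * l for l in lagged]))
    return lam, current, [c - lam * l for c, l in zip(current, lagged)]

# Hypothetical test series: x_t = 0.5 x_{t-1} + 0.3 x_{t-2} + white noise.
random.seed(2)  # fixed seed, for reproducibility only
x = [0.0, 0.0]
for _ in range(5000):
    x.append(0.5 * x[-1] + 0.3 * x[-2] + random.gauss(0.0, 1.0))

# Deliberately suboptimal estimator: predict x_t by 0.5 * x_{t-1} alone.
e = [x[t] - 0.5 * x[t - 1] for t in range(1, len(x))]
lam, old, new = improved_residuals(e, tau=1)
old_ms = mean([v * v for v in old])
new_ms = mean([v * v for v in new])
print(f"lambda = {lam:.3f}, <e^2> = {old_ms:.4f}, improved = {new_ms:.4f}")
```

The residuals of the suboptimal estimator are correlated at lag one, so $\lambda$ is nonzero and the corrected estimator does strictly better, which is the contradiction the argument relies on.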

ACKNOWLEDGEMENTS 

We are pleased to acknowledge Bryan Galdrikian, Andre Longtin,
and Doyne Farmer, who collaborated with us in developing the
method of surrogate data. We are also grateful to Blake LeBaron,
William Brock, Tim Sauer, and Lou Pecora for many useful dis- 
cussions. This work was partially supported by the National In- 
stitute for Mental Health under Grant No. 1-R01-MH47184, and 
performed under the auspices of the U.S. Department of Energy. 

REFERENCES 

1. A. I. McLeod and W. K. Li, "Diagnostic checking ARMA time series
models using squared-residual autocorrelations," J. Time Series
Anal. 4, 269-273 (1983).

2. D. M. Keenan, "A Tukey nonadditivity-type test for time series
nonlinearity," Biometrika 72, 39-44 (1985).

3. R. S. Tsay, "Nonlinearity tests for time series," Biometrika 73,
461-466 (1986).

4. H. Tong, Non-linear Time Series: A Dynamical System Approach. 
(Clarendon Press, Oxford, 1990). 

5. W. A. Brock, W. D. Dechert, and J. Scheinkman, "A test for
independence based on the correlation dimension," Technical Report
8702, Social Systems Research Institute, University of Wisconsin, 
Madison (1987). 

6. W. A. Brock, "Distinguishing random and deterministic systems," 
J. Econ. Theo. 40, 168-195 (1986). 

7. W. A. Brock, D. A. Hsieh, and B. LeBaron, Nonlinear Dynamics, 
Chaos, and Instability: Statistical Theory and Economic Evidence. 
(MIT Press, Cambridge, MA, 1991). 

8. D. A. Hsieh, "Testing for nonlinear dependence in daily foreign
exchange rate changes," J. Business 62, 339-368 (1989).

9. D. A. Hsieh, "Chaos and nonlinear dynamics: application to financial
markets," J. Finance 46, 1839-1877 (1991).

10. T.-H. Lee, H. White, and C. W. J. Granger, "Testing for neglected 
nonlinearity in time series models: A comparison of neural net- 
work methods and alternative tests," J. Econometrics 56, 269-290 
(1993). 

11. R. B. Blackman and J. W. Tukey, The Measurement of Power Spectra.
(Dover, New York, 1959).

12. J. Theiler, "Spurious dimension from correlation algorithms applied 
to limited time series data," Phys. Rev. A 34, 2427-2432 (1986). 

13. A. R. Osborne and A. Provenzale, "Finite correlation dimension for 
stochastic systems with power-law spectra," Physica D 35, 357-381 
(1989). 

14. A. Provenzale, A. R. Osborne, and R. Soj, "Convergence of the K2 
entropy for random noises with power law spectra," Physica D 47, 
361-372 (1991). 

15. J. Theiler, "Some comments on the correlation dimension of $1/f^\alpha$
noise," Phys. Lett. A 155, 480-493 (1991).

16. P. E. Rapp, A. M. Albano, T. I. Schmah, and L. A. Farwell, "Filtered
noise can mimic low dimensional chaotic attractors," Phys.
Rev. E 47, 2289-2297 (1993).



17. T. Sauer, J. A. Yorke, and M. Casdagli, "Embedology," J. Stat. 
Phys. 65, 579-616 (1991). 

18. D. S. Broomhead, J. P. Huke, and M. R. Muldoon, "Linear filters
and nonlinear systems," J. R. Stat. Soc. B 54, 373-382 (1992). 

19. T. Sauer and J. A. Yorke, "How many delay coordinates do you 
need?" Int. J. Bifurcation Chaos (1993). To appear. 

20. S. H. Isabelle, A. V. Oppenheim, and G. W. Wornell, "Effects of 
convolution on chaotic signals." Preprint, MIT Research Laboratory 
of Electronics. 

21. L. M. Pecora and T. L. Carroll, "Attractor reconstruction, filtering, 
and ill-posed problems." Preprint, Naval Research Laboratory. 

22. R. Badii, G. Broggi, B. Derighetti, M. Ravani, S. Ciliberto,
A. Politi, and M. A. Rubio, "Dimension increase in filtered chaotic 
signals," Phys. Rev. Lett. 60, 979 (1988). 

23. F. Mitschke, M. Moller, and W. Lange, "Measuring filtered chaotic 
signals," Phys. Rev. A 37, 4518-4521 (1988). 

24. F. Mitschke, "Acausal filters for chaotic signals," Phys. Rev. A 41, 
1169-1171 (1990). 

25. J. Theiler, B. Galdrikian, A. Longtin, S. Eubank, and J. D. Farmer,
"Using surrogate data to detect nonlinearity in time series," in
Nonlinear Modeling and Forecasting. Proceedings of the Workshop held
September, 1990, in Santa Fe, New Mexico, M. Casdagli and S. Eubank,
eds. Vol. XII of SFI Studies in the Sciences of Complexity,
(Addison-Wesley 1992), pp. 163-188.

26. M. Hénon, "A two-dimensional mapping with a strange attractor,"
Comm. Math. Phys. 50, 69-77 (1976).

27. H. Akaike, "A new look at the statistical model identification," 
IEEE Trans. Auto. Control 19, 716-723 (1974). 

28. G. Schwarz, "Estimating the dimension of a model," Ann. Stat. 6, 
461-464 (1978). 

29. J. D. Farmer and J. J. Sidorowich, "Predicting chaotic time series," 
Phys. Rev. Lett. 59, 845-848 (1987). 

30. P. Grassberger, "On the fractal dimension of the Hénon attractor,"
Phys. Lett. A 97, 224-226 (1983). 

31. H. D. I. Abarbanel, R. Brown, and M. B. Kennel, "Lyapunov ex- 
ponents in chaotic systems: their importance and their evaluation 
using observed data," Int. J. Mod. Phys. B 5, 1347-1375 (1991). 

32. D. Nychka, S. Ellner, A. R. Gallant, and D. McCaffrey, "Finding
chaos in noisy systems," J. R. Stat. Soc. B 54, 399-426 (1992).

33. D. T. Kaplan and R. J. Cohen, "Is fibrillation chaos?" Circulation
Res. 67, 886-892 (1990).

34. R. S. Tsay, "Model checking via parametric bootstraps in time series
analysis," Appl. Stat. 41, 1-15 (1992).

35. L. A. Smith, "Identification and prediction of low dimensional dy- 
namics," Physica D 58, 50-76 (1992). 

36. J. Theiler, S. Eubank, A. Longtin, B. Galdrikian, and J. D. Farmer, 
"Testing for nonlinearity in time series: the method of surrogate 
data," Physica D 58, 77-94 (1992). 

37. J. Theiler, P. S. Linsay, and D. M. Rubin, "Detecting nonlinear- 
ity in data with long coherence times," in Time Series Prediction: 
Forecasting the Future and Understanding the Past. Proceedings of 
the Workshop held May, 1992, in Santa Fe, New Mexico, A. S. 
Weigend and N. A. Gershenfeld, eds. Vol. XVII of SFI Studies in 
the Sciences of Complexity, (Addison-Wesley, 1993), pp. 429-455.

38. B. Efron and R. Tibshirani, "Bootstrap methods for standard errors,
confidence intervals, and other measures of statistical accuracy," 
Stat. Sci. 1, 54-77 (1986). 

39. P. Grassberger and I. Procaccia, "Characterization of strange at- 
tractors," Phys. Rev. Lett. 50, 346-349 (1983). 

40. We chose this version of McLeod-Li as a representative of the
"conventional" approach to detecting nonlinearity because it is simple
to understand and implement; there are better approaches (see 
Tong [4]) but these tend to involve massive regressions. Our purpose 
here is meant to be more illustrative than comparative. 

41. We are grateful to William Brock for suggesting this example. 

42. M. Berge, "Quantification of chaos in a time series of water levels." 
Unpublished (1990). 

43. B. Townshend, "Nonlinear prediction of speech signals," in
Nonlinear Modeling and Forecasting. Proceedings of the Workshop held
September, 1990, in Santa Fe, New Mexico, M. Casdagli and S. Eubank,
eds. Vol. XII of SFI Studies in the Sciences of Complexity,
(Addison-Wesley 1992), pp. 435-456.






44. E. N. Lorenz, "Deterministic nonperiodic flow," J. Atmos. Sci. 20,
130-141 (1963).

45. O. E. Rössler, "An equation for continuous chaos," Phys. Lett. A
57, 397-398 (1976).

46. J. L. Kaplan and J. A. Yorke, "Chaotic behavior of multidimensional
difference equations," in Functional Differential Equations
and Approximations of Fixed Points, H.-O. Peitgen and H.-O.
Walther, eds. Vol. 730 of Springer Lecture Notes in Mathematics,
(Springer-Verlag, Berlin, 1979), p. 204.

47. The effects of acausal filters are discussed by Mitschke [24] and by 
Pecora and Carroll [21]. 

48. T. W. Anderson, The Statistical Analysis of Time Series. (Wiley, 
New York, 1971). 

49. A. Cenys, "Lyapunov spectrum of the maps generating identical 
attractors," Europhys. Lett. 21, 407-411 (1993). 

50. M. Casdagli, S. Eubank, J. D. Farmer, and J. Gibson, "State 
space reconstruction in the presence of noise," Physica D 51, 52-98 
(1991). 

51. A. M. Fraser and H. L. Swinney, "Independent coordinates for
strange attractors from mutual information," Phys. Rev. A 33,
1134-1140 (1986).

52. W. Liebert and H. G. Schuster, "Proper choice of the time delay 
for the analysis of chaotic time series," Phys. Lett. A 142, 107-111 
(1988). 

53. Z. Aleksic, "Estimating the embedding dimension," Physica D 52, 
362-368 (1991). 

54. M. Casdagli, "Chaos and deterministic versus stochastic nonlinear
modeling," J. R. Stat. Soc. B 54, 303-328 (1992).

55. D. T. Kaplan and L. Glass, "Direct test for determinism," Phys. 
Rev. Lett. 68, 427-430 (1992). 

56. W. A. Brock. Personal communication. 

57. W. A. Brock, J. Lakonishok, and B. LeBaron, "Simple technical 
trading rules and the stochastic properties of stock returns," J. 
Finance 47, 1731-1764 (1992). 

58. J. D. Scargle, "An introduction to chaotic and random time series
analysis," Int. J. Imaging Syst. Tech. 1, 243-253 (1989).

59. J. D. Scargle, "Studies in astronomical time series analysis: IV.
Modeling chaotic and random processes with linear filters,"
Astrophys. J. 359, 469-482 (1990).

60. G. Sugihara and R. May, "Nonlinear forecasting as a way of dis- 
tinguishing chaos from measurement error in forecasting," Nature 
344, 734-741 (1990). 






